Multi-processor FFT
Authors
Abstract
Computing the Fast Fourier Transform on a distributed-memory architecture by a direct pipelined radix-2 algorithm, a bi-section algorithm, or a multi-section algorithm yields the same communications requirement in every case, provided that communication for all FFT stages can be performed concurrently, the input data is in normal order, and the data allocation is consecutive. With a cyclic data allocation, or with bit-reversed input data and a consecutive allocation, multi-sectioning reduces the communications requirement by approximately a factor of two. For a consecutive data allocation and normal input order, a decimation-in-time FFT requires that $P/N + d - 2$ twiddle factors be stored for $P$ elements distributed evenly over $N$ processors, with the axis subject to transformation distributed over $2^d$ processors. No communication of twiddle factors is required. The same storage requirements hold for a decimation-in-frequency FFT with bit-reversed input order and a consecutive data allocation. The opposite combination of FFT type and data ordering requires a factor of $\log_2 N$ more storage for $N$ processors. The peak performance of a Connection Machine system CM-200 implementation is 12.9 Gflops/s in 32-bit precision and 10.7 Gflops/s in 64-bit precision for unordered transforms local to each processor. The corresponding execution rates for ordered transforms are 11.1 Gflops/s and 8.5 Gflops/s, respectively. For distributed one- and two-dimensional transforms, the peak performance for unordered transforms exceeds 5 Gflops/s in 32-bit precision and 3 Gflops/s in 64-bit precision. Three-dimensional transforms execute at a slightly lower rate. Distributed ordered transforms execute at about 1/2 to 2/3 of the rate of the unordered transforms.
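The difference between consecutive and cyclic allocation can be made concrete with a small sketch. The Python below is an illustration only, not the implementation described in the abstract: the function names are hypothetical, and it assumes P and N are powers of two and that one axis spans all N processors. It reports which radix-2 butterfly stages pair elements held by different processors, and it includes a bit-reversal helper for the input orderings the abstract mentions.

```python
def owner_consecutive(i, P, N):
    """Processor holding element i when each processor stores one contiguous block."""
    return i // (P // N)


def owner_cyclic(i, P, N):
    """Processor holding element i when elements are dealt out round-robin."""
    return i % N


def remote_stages(owner, P, N):
    """Bit positions k at which a radix-2 butterfly crosses processors.

    A butterfly at bit position k pairs index i with i ^ (1 << k).
    Assumes P and N are powers of two (an assumption of this sketch).
    """
    stages = []
    for k in range(P.bit_length() - 1):
        if any(owner(i, P, N) != owner(i ^ (1 << k), P, N) for i in range(P)):
            stages.append(k)
    return stages


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i (normal order <-> bit-reversed order)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


P, N = 64, 8  # 64 elements over 8 processors, so log2(N) = 3
consecutive = remote_stages(owner_consecutive, P, N)  # high-order bits are remote
cyclic = remote_stages(owner_cyclic, P, N)            # low-order bits are remote
print(consecutive, cyclic)
```

In this sketch both allocations have log2(N) remote stages, but at opposite ends of the index: the high-order bits for consecutive allocation and the low-order bits for cyclic allocation. Which end is remote determines when in the transform the communication takes place.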
Similar resources
Design and Implementation of 1-D and 2-D Mixed Architecture FFT Processor in Heterogeneous Multi-core SoC based on FPGA
This paper proposes a novel FFT processor architecture that can perform either a 1-D or a 2-D FFT algorithm, depending on the FFT size. The architecture serves as a scalable IP core suitable for heterogeneous multi-core SoC applications. The mixed-architecture FFT processor achieves a balance between high processing speed and resource usage. Compared with a c...
Variable-length FFT Processor Architecture
In this paper, we propose a variable-length FFT processor architecture that could be used in multi-mode and multi-standard OFDM communication systems. Based on a radix-2 single-path delay feedback architecture, the FFT is implemented with the aid of a LUT and control logic. The expected architecture would compute 2048/1024/512-point FFTs. Keywords: FFT; radix-2 FFT; variable-length FFT
Area-efficient FFT processor for MIMO-OFDM based SDR systems
In this letter, an area-efficient FFT processor is proposed for MIMO-OFDM based SDR systems. The proposed FFT processor can support variable lengths of 64, 128, 256, 512, 1024, 1536 and 2048. By reducing the required number of non-trivial multipliers with a mixed-radix algorithm, the complexity of the proposed FFT processor is dramatically decreased. The proposed FFT processor was designed in a...
A high-speed low-complexity modified radix-2^5 FFT processor for gigabit WPAN applications
In this paper, we present a novel modified radix-2^5 algorithm for 512-point fast Fourier transform (FFT) computation and a high-speed eight-parallel data-path architecture for multi-gigabit wireless personal area network (WPAN) systems. The proposed FFT processor can provide a high data throughput and low hardware complexity by using eight-parallel data-path and multi-path delay-feedback...
Design of 2K/4K/8K-point FFT Processor Based on CORDIC Algorithm in OFDM Receiver
In this paper, the architecture and the implementation of a 2K/4K/8K-point complex fast Fourier transform (FFT) processor for OFDM systems are presented. The processor can perform an 8K-point FFT every 273 µs and a 2K-point FFT every 68.26 µs at 30 MHz, which is enough for the OFDM symbol rate. The architecture is based on the Cooley-Tukey algorithm for decomposing the long DFT into short length multi-dimension...
A Novel Architecture for Radix-4 Pipelined FFT Processor using Vedic Mathematics Algorithm
The FFT processor is a critical block in all multi-carrier systems used primarily in the mobile environment. The portability requirement of these systems is the main reason low-power FFT architectures are needed. In this study, an efficient addressing scheme for a radix-4 64-point FFT processor is presented. It avoids the modulo-r addition in the address generation; hence, the critical pa...